SETRED: Self-training with Editing
Abstract
Self-training is a semi-supervised learning algorithm in which a learner keeps on labeling unlabeled examples and retraining itself on an enlarged labeled training set. Since the self-training process may erroneously label some unlabeled examples, sometimes the learned hypothesis does not perform well. In this paper, a new algorithm named Setred is proposed, which utilizes a specific data editing method to identify and remove the mislabeled examples from the self-labeled data. In detail, in each iteration of the self-training process, the local cut edge weight statistic is used to help estimate whether a newly labeled example is reliable or not, and only the reliable self-labeled examples are used to enlarge the labeled training set. Experiments show that the introduction of data editing is beneficial, and the learned hypotheses of Setred outperform those learned by the standard self-training algorithm.
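The editing step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the neighborhood graph is built or how the statistic is tested, so the k-nearest-neighbor graph, the edge weights `1 / (1 + distance)`, the normal approximation, and the names `reliable_mask`, `k`, and `theta` are all assumptions made here. A self-labeled example is kept only when the total weight of its "cut edges" (edges to neighbors with a different label) is significantly smaller than expected if labels were assigned at random according to the class priors.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reliable_mask(X, y, is_new, k=4, theta=0.1):
    """Flag which self-labeled examples look reliable (hypothetical helper).

    X      : (n, d) array holding labeled and self-labeled examples together
    y      : (n,) integer labels; the self-labeled entries are predictions
    is_new : (n,) boolean array, True for the self-labeled examples
    k      : neighbors per example (assumed k-NN graph)
    theta  : left-tail significance level (assumed value)
    """
    n = len(X)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes.tolist(), (counts / n).tolist()))
    # Pairwise Euclidean distances between all examples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    keep = np.zeros(n, dtype=bool)
    for i in np.flatnonzero(is_new):
        order = np.argsort(dist[i])
        nbrs = order[order != i][:k]             # k nearest neighbors, excluding i
        w = 1.0 / (1.0 + dist[i, nbrs])          # assumed edge weights
        cut = (y[nbrs] != y[i]).astype(float)    # 1.0 where the edge is a cut edge
        j_stat = float(np.sum(w * cut))          # local cut edge weight statistic
        # Null hypothesis: each neighbor differs in label with probability p_cut.
        p_cut = 1.0 - prior[int(y[i])]
        mean = p_cut * float(np.sum(w))
        var = p_cut * (1.0 - p_cut) * float(np.sum(w ** 2))
        if var == 0.0:
            continue                             # only one class present: no evidence
        z = (j_stat - mean) / math.sqrt(var)
        keep[i] = normal_cdf(z) < theta          # keep if cut weight is unusually low
    return keep
```

In a self-training loop, only the self-labeled examples for which this mask is true would be added to the labeled training set before the learner is retrained.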
Related resources
The Significance of Peer-Editing in Teaching Writing to EFL Students
This study set out to investigate the effect of peer-editing as a metacognitive strategy on the development of writing. It was hypothesized that peer-editing could be used to raise the learners' grammatical and compositional awareness. Forty pre-intermediate sophomores at Islamic Azad University-Tabriz Branch participated in the study, taking the course Writing I. To warrant the initial homo...
Editing (Virayesh) as a Movement of Resistance During the Iran-Iraq War
The present study concerns editing of translations in Iran during the Iran-Iraq War, which in the official discourse of the country is known as the Sacred Defense. It argues that editing, in its local sense, advocated a linguistic purism inspired by a redefined nationalism, which went hand in hand with identity politics and snowballed into a movement of resistance.
Language Adaptation for Extending Post-Editing Estimates for Closely Related Languages
This paper presents an open-source toolkit for predicting human post-editing efforts for closely related languages. At the moment, training resources for the Quality Estimation task are available for very few language directions and domains. Available resources can be expanded on the assumption that MT errors and the amount of post-editing required to correct them are comparable across related ...
Semi-supervised multi-label image classification based on nearest neighbor editing
Semi-supervised multi-label classification has been applied to many real-world tasks such as image classification and document classification. In semi-supervised learning, unlabeled samples are added to the training set to enhance classification performance; however, noise is introduced at the same time. In order to reduce this negative effect, the nearest neighbor data edi...
Enhanced Texture Editing using Self Similarity
Texture mapping is an indispensable tool for achieving realism in computer graphics. Significant progress has been made in recent years with regard to the synthesis and editing of 2D texture images. However, the exploration of user control for semi-automatic texture editing remains an open area of research. We present methods that partially address the semantic and technical limitations of Sel...